Use timely's logging infrastructure to log Tracker state #321
Add TrackerEvent, which records additions or removals of capabilities, as well as propagation events when changes in implications are propagated along the internal connections and edges of the graph. Add DebugEvent, which records the state of the pointstamps, implications, and worklist of Tracker. Loggers for these events are enabled the same way as for logging::TimelyEvent.
e3908d5 to 6db0fbf
timely/src/progress/frontier.rs
Outdated
@@ -317,6 +317,13 @@ impl<T: PartialOrder+Ord+Clone> MutableAntichain<T> {
self.frontier().less_equal(time)
}

/// Clones the vector of updates.
/// Only used for debugging purposes.
Can you say a bit more about what this is used for in the comments? If this lands, I'll need to support it, and it would help to understand why it is here and under which circumstances it could go away.
Removed the function as it's no longer needed.
timely/src/progress/frontier.rs
Outdated
/// Clones the vector of updates.
/// Only used for debugging purposes.
#[inline]
pub fn updates(&self) -> Vec<(T, i64)> {
Could this return a &[(T, i64)] instead, to avoid a mandatory allocation?
Removed the function completely as it's no longer needed.
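For reference, the slice-returning variant suggested in this thread could look roughly like the sketch below. UpdateBuffer and its field are hypothetical stand-ins, not timely's MutableAntichain, and the function was removed from the PR anyway.

```rust
/// Hypothetical stand-in for a structure that buffers `(time, diff)` updates.
/// This is not timely's `MutableAntichain`; the names are assumptions.
pub struct UpdateBuffer<T> {
    updates: Vec<(T, i64)>,
}

impl<T: Clone> UpdateBuffer<T> {
    /// Creates an empty buffer.
    pub fn new() -> Self {
        UpdateBuffer { updates: Vec::new() }
    }

    /// Appends one `(time, diff)` update.
    pub fn push(&mut self, time: T, diff: i64) {
        self.updates.push((time, diff));
    }

    /// Allocating variant: clones every update into a fresh vector.
    pub fn updates_cloned(&self) -> Vec<(T, i64)> {
        self.updates.clone()
    }

    /// Borrowing variant: hands out a view of the existing storage,
    /// so callers that only read pay for no allocation.
    pub fn updates(&self) -> &[(T, i64)] {
        &self.updates
    }
}
```

A caller that does need ownership can still write `buf.updates().to_vec()`, which makes the allocation explicit at the call site.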
timely/src/progress/reachability.rs
Outdated
@@ -145,8 +147,9 @@ pub struct Builder<T: Timestamp> {
impl<T: Timestamp> Builder<T> {

/// Create a new empty topology builder.
- pub fn new() -> Self {
+ pub fn new(path: Vec<usize>) -> Self {
I think this is a big abstraction change, in that reachability.rs used to be agnostic to the hierarchical nature of names in timely dataflow, and just did reachability tracking in a scope with no additional information. This seems to bake that in now, and I will have to ponder whether that is a good call or not.
Another option might be to produce another implementation of reachability.rs, putting things behind a trait.
Not trying to be difficult, but attempting to minimize the complexity in an already-too-complicated bit of logic.
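To make the trait option concrete, here is a minimal sketch of how the core tracker could stay address-agnostic while a wrapper adds the scope path and logging. Every name here (Reachability, PlainTracker, LoggedTracker) is hypothetical and none of it is timely's API; the real tracker does far more than buffer updates.

```rust
use std::fmt::Debug;

/// Hypothetical interface; none of these names come from timely.
pub trait Reachability<T> {
    /// Records a change of `diff` to the count of `time` at `location`.
    fn update(&mut self, location: usize, time: T, diff: i64);
}

/// Plain tracker: knows nothing about hierarchical scope addresses.
pub struct PlainTracker<T> {
    pub pending: Vec<(usize, T, i64)>,
}

impl<T> PlainTracker<T> {
    pub fn new() -> Self {
        PlainTracker { pending: Vec::new() }
    }
}

impl<T> Reachability<T> for PlainTracker<T> {
    fn update(&mut self, location: usize, time: T, diff: i64) {
        self.pending.push((location, time, diff));
    }
}

/// Wrapper that adds a scope address and an in-memory event log
/// around any tracker, leaving the wrapped tracker unchanged.
pub struct LoggedTracker<R> {
    pub path: Vec<usize>,
    pub inner: R,
    pub log: Vec<String>,
}

impl<T: Debug, R: Reachability<T>> Reachability<T> for LoggedTracker<R> {
    fn update(&mut self, location: usize, time: T, diff: i64) {
        // Log first, then delegate to the address-agnostic tracker.
        self.log.push(format!("{:?}/{}: {:?} {:+}", self.path, location, time, diff));
        self.inner.update(location, time, diff);
    }
}
```

The trade-off is an extra layer of indirection in exchange for keeping the existing logic free of naming concerns.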
That's a good point. I removed the path argument from the Tracker and Builder constructors.
However, it still must be supplied when the logger is registered. The logger now looks like this:
pub tracker_logger: Option<(Vec<usize>, crate::logging::Logger<TrackerEvent>)>,
The logging events such as "a capability is added/removed from a location" do not make sense if they cannot be tracked down to a particular location.
Let me know what you think please and whether you would still prefer going your way.
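As an illustration of the shape described in this reply, the sketch below pairs the scope path with the logger inside an Option, so the tracker constructor needs no path. The Logger and TrackerEvent types are simplified stand-ins written for this example, not the types in the PR or in timely's logging module.

```rust
/// Hypothetical, simplified event; the PR's `TrackerEvent` carries more detail.
#[derive(Debug, Clone, PartialEq)]
pub struct TrackerEvent {
    /// Scope address followed by the location inside that scope.
    pub addr: Vec<usize>,
    /// Change in the capability count at that address.
    pub diff: i64,
}

/// Stand-in for `crate::logging::Logger<TrackerEvent>`:
/// it only collects events in memory.
pub struct Logger<E> {
    pub events: Vec<E>,
}

pub struct Tracker {
    /// The scope address travels with the logger, not with the tracker itself.
    pub tracker_logger: Option<(Vec<usize>, Logger<TrackerEvent>)>,
}

impl Tracker {
    /// Constructs a tracker without any path or logger.
    pub fn new() -> Self {
        Tracker { tracker_logger: None }
    }

    /// Records a capability change, logging it only if a logger is registered.
    pub fn update(&mut self, location: usize, diff: i64) {
        if let Some((path, logger)) = self.tracker_logger.as_mut() {
            let mut addr = path.clone();
            addr.push(location);
            logger.events.push(TrackerEvent { addr, diff });
        }
        // Propagation of the change is omitted from this sketch.
    }
}
```

With no logger registered, `update` never touches a path, which is the property the reviewer asked to preserve.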
I have some general questions about the PR! I think I understand the goal, which is to get out information about the steps that Would it serve your purposes just as well to instrument the moments and nature of pointstamp updates and propagation? |
Thank you for the comments! I changed the logging so there is only one additional logger (as opposed to two), logging the minimum information needed to re-execute the code, as you suggested.
9e77029 to 9d66121
Hi folks, a couple more notes from a conversation with @saradecova. We're using This seems reasonable (we wouldn't know how to encode the type otherwise), however this makes it harder (impossible) for the consumer to know what type to expect for a certain scope. As an example, to be able to replay the behaviour of the We're wondering if it makes sense to add (or adjust an existing) One option would be to use I'm still considering options, but @frankmcsherry let us know if you have opinions.
To follow on @utaal and our conversation, we can encode the type in string using Moreover, by adding In practice, this could be an event informing us that "A new subscope was created at address
Also, it may be the right time to address these todo(s) in timely-dataflow/timely/src/progress/broadcast.rs Lines 68 to 69 in 06fac10
timely-dataflow/timely/src/progress/broadcast.rs Lines 118 to 119 in 06fac10
|
Logs an event whenever a new instance of a Subgraph is created.
I'm back to looking at this. Very sorry for the delay. I have several spot comments, and generally think that before landing durably in timely it needs a bit more design work. In particular,
I still don't have a great read on the requirements here. I apologize if my comments have been confused. My understanding is that you want to be able to extract progress information from the reachability subsystem, and I'm guessing that is to drive your work on progress stuffs. Do you need the fine-grained update information, or just the aggregate information extracted in I'm currently trying to reconcile this with other requests for information about progress tracking, which I think are more about "log the state of the dataflow-wide frontier". It probably relates, and definitely has the same awkwardness around timestamps being generic. Anyhow, I'm thinking about this now, and trying to understand which things are important to log and which are optional!
I have another ask: is there a qualitative difference for you between logging the progress updates in EDIT: It also has the nice property that they can be logged transactionally, all at the same timestamp, which may avoid transient weirdness for folks looking at the data (I don't believe we work hard to fold in positive updates before negative updates). This looks like a good direction to go to expose information about the state of system progress, but I bet my requirements are not your requirements (e.g. that change above is fine for me, but I don't want to do it if it breaks your reqs). There are some stray lines added and removed; I can fix these up. Mostly I'm trying to map out "imagine this lands; what 'improvements' should be prohibited?" |
// Perhaps log information about the creation of subgraph.
if let Some(l) = self.logging.as_mut() {
    l.log(crate::logging::SubgraphEvent {
        id: worker.index(),
        addr: path.clone(),
        timestamp_type: std::any::type_name::<TInner>().to_string(),
    });
}
I'd love to reframe this as a TrackerEvent and have it be part of the line just up above (i.e. "tracker came in to existence"). I suspect something like Tracker::install_logger(...) could do both of those things and wrap up the abstraction well. I'm happy to do that after the fact if that works for you.
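A rough sketch of what such a "tracker came into existence" event might carry, using std::any::type_name to encode the generic timestamp type as a string the way the diff above does. The enum, its variant, and the helper function are hypothetical names for illustration only. Note that the standard library documents the output of type_name as best-effort and not guaranteed to be stable across compiler versions, so consumers should treat it as a diagnostic hint rather than a type identifier.

```rust
use std::any::type_name;

/// Hypothetical event type; the variant and field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum TrackerEvent {
    /// A tracker was created for the scope at `addr`.
    Created {
        addr: Vec<usize>,
        /// Best-effort name of the scope's timestamp type.
        timestamp_type: String,
    },
}

/// Builds the creation event for a scope whose timestamp type is `T`.
pub fn created_event<T>(addr: &[usize]) -> TrackerEvent {
    TrackerEvent::Created {
        addr: addr.to_vec(),
        timestamp_type: type_name::<T>().to_string(),
    }
}
```

An install_logger-style method could emit this event once at registration time and then reuse the same logger for subsequent update events.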
// double-check that child 0 (the outside world) is correctly shaped.
assert_eq!(self.children[0].outputs, self.inputs());
assert_eq!(self.children[0].inputs, self.outputs());
Can you explain why these moved down 20 lines?
let (internal_summary, _) = operator.get_internal_summary();
If at all possible, I'd like to keep this next to the set_external_summary() call just to be clear that they are paired. I'm happy to have it hoisted as well.
/// Internal summary for every combination of input and output port.
pub internal_summaries: Vec<Vec<String>>,
Can you explain what these are used for? Would it be equally beneficial to have the report from the tracker about its input-to-output summaries? That would leave this event stable and consolidate the timestamp/summary related events to the reachability tracker.
@@ -597,7 +629,6 @@ impl<T:Timestamp> Tracker<T> {
// will discover zero-change times when we first visit them, as no further
// changes can be made to them once we complete them.
while let Some(Reverse((time, location, mut diff))) = self.worklist.pop() {
|
random whitespace
@@ -654,6 +685,7 @@ impl<T:Timestamp> Tracker<T> {
};
}
}
|
random whitespace
Note to self: this would be great pointed at the new logging channel now introduced in #352.
Based on PR TimelyDataflow#321 by @saradecova
Closing in favor of #375, which borrows heavily from this. If it turns out that there is an urgent need for e.g. logging the topology information, I can certainly make that happen too.
Add TrackerEvent which records additions or removals
of capabilities, as well as propagation events when changes
in implications are propagated along the internal connections
and edges of the graph.
Add DebugEvent which records the state of pointstamps,
implications, and worklist of Tracker.
Enabling loggers for these events is done the same way as for
logging::TimelyEvent's
At the moment, we use these loggers for comparative testing between the
Isabelle implementation of progress tracking and our Rust implementation.
This graph plots the runtime of the computation with and without the changes, and suggests
that the increase is not significant despite the changes being on the critical path.
The example used is timely/examples/barrier.rs with 10,000,000 samples, run
with four workers. The graph shows the complementary CDF for:
We got similar results when running other examples.
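For readers unfamiliar with the plot type: a complementary CDF gives, for each value x, the fraction of samples strictly greater than x. The sketch below is one way to compute it from raw measurements; it is an illustration only, not the script used to produce the graph, and the function name and integer sample type are assumptions.

```rust
/// Returns `(x, fraction of samples strictly greater than x)` for each
/// distinct sample value, in ascending order of `x`.
pub fn complementary_cdf(samples: &[u64]) -> Vec<(u64, f64)> {
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len() as f64;
    let mut out = Vec::new();
    let mut i = 0;
    while i < sorted.len() {
        let x = sorted[i];
        // Advance past every copy of `x` so that `j` counts samples <= x.
        let mut j = i;
        while j < sorted.len() && sorted[j] == x {
            j += 1;
        }
        out.push((x, (sorted.len() - j) as f64 / n));
        i = j;
    }
    out
}
```

Plotting these pairs on a logarithmic y-axis emphasizes the tail, which is where overhead on the critical path would be expected to show up.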